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1. Introduction 
1.1. Background 


Contingency tables are an important data structure in statistics for representing 
the joint empirical distribution of multivariate data, and are useful for testing 
properties such as independence between the rows and columns [36] and simi¬ 
larity between two rows or two columns [38, 32] . 

Such statistical tests typically involve defining a test statistic and comparing 
its observed value to its distribution under the null hypothesis, that all (r, c)- 
contingency tables are equally likely. The null distribution of such a statistic 
is often impossible to study analytically, but can be approximated by generat¬ 
ing contingency tables uniformly at random. In this paper, we introduce new 
sampling algorithms for (r, c)-contingency tables. 

The most popular approach in the literature for the random generation of con¬ 
tingency tables is Markov Chain Monte Carlo (MCMC) [21, 22, 23, 28, 35], in 
which one starts with a contingency table and randomly changes a small number 
of entries in a way that does not affect the row sums and column sums, thereby 
obtaining a slightly different contingency table. After sufficiently many moves, 
the new table will be almost independent of the starting table; repeating this 
process yields almost uniform samples from the set of (r, c)-contingency tables. 
The downside to this approach is that the number of steps one needs to wait 
can be quite large, see for example [14], and by the nature of MCMC, one must 
prescribe this number of steps before starting, so the runtime is determined not 
by the minimum number of steps required but the minimum provable number 
of steps required. 

An alternative approach is Sequential Importance Sampling (SIS) [17, 21, 20, 
50], where one samples from a distribution with computable deviation from 
uniformity, and weights samples by the inverse of their probability of occurring, 
to obtain unbiased estimates of any test statistic. Such techniques have proven 
to be quite fast, but the non-uniformity can present a problem [14]: depending 
on the parameters (r, c), the sample can be exponentially far from uniform, and 
thus the simulated distribution of the test statistic can be very different from 
the actual distribution despite being unbiased. 

There are also extensive results pertaining to counting the number of (r, c)- 
contingency tables, see for example [8, 12, 46, 3, 24, 4, 5, 37, 7, 11, 10, 27]. 
While our exact sampling algorithms benefit from this analysis, the approximate 
sampling algorithm in Algorithm 2 does not require these formulas. 

The special case of 2 x n contingency tables has received particular attention in 
the literature, as it is relatively simple while still being interesting—many statis¬ 
tical applications of contingency tables involve an axis with only two categories 
(male/female, test/control, etc). An asymptotically uniform MCMC algorithm 
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is presented in [32]. In addition, [38] adapted the same chain using coupling 
from the past to obtain an exactly uniform sampling algorithm. We describe an 
exactly uniform sampling algorithm in Section 2.3 which has an overall lower 
runtime cost than both approaches and requires no complicated rejection func¬ 
tions nor lookup tables. 


1.2. Approach 

The main tools we will use in this paper are rejection sampling [49] and proba¬ 
bilistic divide-and-conquer (PDC) [1]. 

Rejection sampling is a powerful method by which one obtains random variates 
of a given distribution by sampling from a related distribution, and rejecting 
observations with probability in proportion to their likelihood of appearing in 
the target distribution [49]. For many cases of interest, the probability of rejec¬ 
tion is in {0,1}, in which case we sample repeatedly until we obtain a sample 
that lies inside the target set; this is the idea behind the exact Boltzmann sam¬ 
pler [30, 29]. 

In other applications of rejection sampling, the probability of rejection is in the 
interval [0,1], and depends on the outcome observed. Several examples of this 
type of rejection sampling are presented in [1, Section 3.3]; see also [25]. 

PDC is an exact sampling technique which appropriately pieces together samples 
from conditional distributions. The setup is as follows. Consider a sample space 
consisting of a cartesian product A x B of two probability spaces. We assume 
that the set of objects we wish to sample from can be expressed as the collection 
of pairs £((A, B ) | (A, B ) £ I?) for some E C A x B, where 

A £ A, B £ B have given distributions, (1) 

A,B are independent, (2) 

and either 

(1) E is a measurable event of positive probability; or, 

(2) (i) There is some random variable T on A x B which is either discrete or 

absolutely continuous with bounded density such that E = {T = k} 
for some k £ range(T), and 

(ii) For each a £ A, there is some random variable T a on B which is 
either discrete or absolutely continuous with bounded density such 
that {b £ B : (a, b) £ E} = {T a = k a } for some k a £ range(T 0 ). 

We then sample from ( (A, B) | {A 1 B) £ E ) in two steps. 

1. Generate sample from (A | ( A,B ) £ E), call it x. 

2. Generate sample from (B \ (x,B) £ E), call it y. 
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The PDC lemmas, [1, Lemma 2.1] in case (1), and [25, Lemma 2.1 and Lemma 2.2] 
in case (2), imply that the pair (x, y) is an exact sample from (( A , B) | ( A , B ) £ E) 

In Algorithm 1, we utilize the well-known fact that for a geometric random 
variable Z with parameter 1 — q, the random variable rj(q) := 1 (Z is odd) is 
Bernoulli with parameter which is independent of (Z — rf)/ 2, which is 

geometrically distributed with parameter 1 — q 2 . In our first case of interest, 
nonnegative integer-valued contingency tables, q is of the form qj = Cj/(m + Cj), 
where Cj is the j-th column sum, j = 1, 2,..., n; see lemmas 3.1 and 3.6. 

The use of rejection sampling is optimized by picking a distribution which is 
close to the target distribution. The random variable q(qj) serves as an a priori 
estimate for the true distribution of the least significant bit of an entry under 
the conditional distribution. Decomposing a geometric random variable by its 
bits appears to be a relatively recent technique for random sampling. It was 
effectively utilized in [1] for the random sampling of integer partitions of a 
fixed size n, i.e., an exact Boltzmann sampler, with 0(1) expected number of 
rejections. 

After the least significant bit of each entry of the table is sampled, we return 
to the same sampling problem with reduced row sums and column sums. More 
importantly, after each step, we choose new parameters qj : j = 1,2,..., n, based 
on these new column sums, which tilts the distribution more in favor of this new 
table. If we had sampled the geometric random variables directly, it would be 
equivalent to choosing one value of q for the entire algorithm, and sampling from 
Bernoulli random variables with parameters q/( 1 + q), q 2 /( 1 + q 2 ), q 4 /( 1 + q 4 ), 
etc. Using this new approach, we are sampling from Bernoulli random variables 
with parameters q/(l + q), q'/(1 + q'), q"/(1 + q"), etc., where q ', q", etc., are 
chosen after each iteration, and more effectively target the current set of row 
sums and column sums. 

Remark 1.1. One might define a new type of random variable, say, a quasi- 
geometric random variable, denoted by Q , which is defined as 

Q{q, q', q",...) = Bern (+ 2 Bern (l^) + 4 Bern (t+7 7 ) + ''' ’ 

where all of the Bernoulli random variables are mutually independent. In our 
case, we choose q' after an iteration of the algorithm completes with q. Subse¬ 
quently, we use q" after an iteration of the algorithm completes with q ', etc. This 
quasi-geometric random variable has more degrees of freedom than the usual ge¬ 
ometric random variable, although we certainly lose other nice properties which 
are unique to geometric random variables. 

Remark 1.2. In [8], a question is posed to obtain the asymptotic joint distribu¬ 
tion of a small subset of entries in a random contingency table, and an example 
of a specific table is presented which shows that using independent geometric 
random variables as the marginal distributions of small subsets of this joint den¬ 
sity leads to joint distributions which are not indicative of the uniform measure 
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over the set of such contingency tables. We surmise it may be more fruitful to 
consider the joint distribution of the r least significant bits of the usual geomet¬ 
ric distribution; or, alternatively, to consider a quasi-geometric random variable 
defined in Remark 1.1 with given parameters q , q ', q", etc. 

In addition to sampling from nonnegative integer-valued tables, one may also 
wish to sample from nonnegative real-valued tables with real-valued row sums 
and column sums. For a real-valued contingency table, the role of geometric 
random variables is replaced with exponential random variables. Instead of sam¬ 
pling from the smallest bit of each entry, we instead sample the fractional parts 
first, and what remains is an integer-valued table with integer-valued row sums 
and column sums. 

To obtain a good candidate distribution for the fractional part of a given entry, 
we utilize two well-known facts summarized in the lemma below. For real x, {x} 
denotes the fractional part of x, and \x\ denotes the integer part of x, so that 

x ~ Y x \ + i x }- 

Lemma 1.3. Let Y be an exponentially distributed random variable with pa¬ 
rameter A > 0, then: 

• the integer part, \Y J, and the fractional part, { Y }, are independent [f 9 , 

48]; 

• |YJ is geometrically distributed with parameter 1 — e _A , and {T} has 
density f\(x) = Xe~ Xx /(I — e -A ), 0 < x < 1. 

In this case, the recommended approach is the following: 

1. Sample the fractional part of the entries of the table first, according to 
the conditional distribution; 

2. Sample the remaining integer part of the table. 

If an exact sample of each of the items above can be obtained, then an appli¬ 
cation of PDC implies that the sum of the entries of the two tables has the 
uniform distribution over nonnegative real-valued tables. 

For 2xn tables, Algorithm 3 is particularly elegant, as it is an exact sampling 
algorithm which does not require the computation of any rejection functions nor 
lookup tables, and instead exploits a property of conditional distributions. We 
also present several alternative exact sampling PDC algorithms for tables with 
entries with given marginal distributions, subject to relatively minor conditions. 

Finally, by dropping the requirement that an algorithm should produce a sample 
uniformly at random from a given set of contingency tables, we provide an 
approximate sampling algorithm in Algorithm 2 for which any given bit is in 
its incorrect proportion by at most m + 2, where m is the number of rows; see 
Lemma 2.8. An approximate sampling algorithm for the fractional part of the 
entries of a given table is given in Algorithm 5. 
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The paper is organized as follows. Section 2 contains the main algorithms for the 
random sampling of (r, c)-contingency tables, including an approximate sam¬ 
pling algorithm which does not rely on any enumeration formulas. Section 3 
contains relevant properties of contingency tables. Section 4 contains the proof 
that Algorithm 1 is uniform over all tables. Section 5 discusses the PDC algo¬ 
rithm in the special case when there are exactly 2 rows. Section 6 formulates 
similar PDC algorithms for a joint distribution with given marginal probability 
distributions under mild restrictions. 


2. Main Algorithms 
2.1. Integer-valued tables 

Our first algorithm is Algorithm 1 below, which uniformly samples from the set 
of nonnegative integer-valued contingency tables with any given size and row 
sums and column sums. The algorithm is recursive (though our implementation 
is iterative). Each step of the algorithm samples the least significant bit of a 
single entry in the table, in proportion to its prevalence under the conditional 
distribution, via rejection sampling, and once the least significant bit of all entries 
in the table have been sampled, all of the row sums and column sums are reduced 
by at least a factor of two. The algorithm repeats with the next significant bit, 
etc., until all remaining row sums and column sums are 0 . 

For a given set of row sums r = (n,..., r m ) and column sums c = (ci,..., c n ), 
let E(r, c) denote the number of (r, c)-contingency tables. Let O denote an to x n 
matrix with entries in {0,1}. For any set of row sums and column sums (r, c), let 
E (r, c, O) denote the number of (r, c)-contingency tables with entry (i, j) forced 
to be even if the (i,j)th entry of O is 1, and no restriction otherwise. Let Oij 
denote the matrix which has entries with value 1 in the first j — 1 columns, and 
entries with value 1 in the first i rows of column j, and entries with value 0 
otherwise. 

Define for 1 < i < m — 2,1 < j < n — 2, 



(3) 


When j = n— 1, we have a slightly different expression. Let qj := , and let 

Uj := qj 1 = 1 + ^ 7 . Let b(k) be such that ri — k — b(k) is even. Then we define 


for 1 < * < to — 2 


f(i,n - 1 ,k,r,c) : 


(4) 
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/ {... ,n — k — b(k ),...), \ 

£ . ,,Cn-i — k,Cn-b(k)), ■ y 

V Oi. n J 


b(k) 


(...,n - 1 - 6(1),...), 

(■• 

Oi. 


£ I (. .. ,C„_i -l,Cn~ 6(1)), • 2/n ^ + £ ( (. . . ,C n -l,C„ - 6(0)), ) 


(...,ri -6(0),...), 

(•■ 

Oi, 


'x,n / \ / 

When i = m — 1, let u(fc) be such that Cj — k — v(k ) is even. Then we define for 
1 < j < n - 2, 

f(m — l,j, k,r,c) := (5) 

(... ,r m _i -k,r m - v(k)), 


-k-v(k),...), 

Or, 


Vi 


(k) 


'm,j 


(•..,^* 771—1 1 ^( 1 )), 

(. . . , Cj — 1 — u(l), • • .), 
Orn.i 


• y/ ) + £ | (• ..,cj-u(o),...), 
Om.i 


(... - u(0)), 


• Vj 


(o) 


Finally, when * = m — 1 and j = n — 1, let u(fc) be such that c„_i — k — v(k) is 
even, let 7 (k) be such that r m _i — k — 7 (fc) is even, and let b(k) be such that 
c n — 7 — 6(/c) is even. Then we define 


(..., r m _i - k - 7 (fc), r m - u(fc) - 6 (fc)), 

A=T,\ (...,c n - 1 -k-v(k),c n -'y(k)-b(k)), ) ■ yl^l ^ (fc)+6(fe) 

@m.n 


(...,r m _i-l- 7 (l),r m -v(l)- 6 (l)), 

B = £| (..., c„_i — fc — t>(l), c„ — 7 ( 1 ) — 6 ( 1 )), I • £j?b( (1)+&(1) , 

(...,r m _i- 7 ( 0 ),r m -v( 0 )- 6 ( 0 )), 
c = £ I (• ■ ■ , - «( 0 ), c n - 7(0) - 6 ( 0 )), ] • ^ (0)+b(0) 

Om.n 


f(m- l,n — l,A:,r,c) := 


A 


B + C 


( 6 ) 


The algorithm itself is then simple and straightforward. 
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Algorithm 1 Generation of uniformly random (r, c)-contingency table 

1: M 4— max (max, r t: . max^ Cj). 

2: t 4— m X n table with all 0 entries. 

3: for 6 = 0,1,..., ["log 2 (M)"| do 

4: Let an denote any permutation such that an o r is in increasing order. 

5: Let a@ denote any permutation such that a@ o c is in increasing order. 

6: r 4— an o r. 

7: c 4— ac o c. 

8 : for j = 1 ,..., n — 1 do 

9: for is 1,_, m — 1 do 

10: if U < then 

11 : £ i,j 4— 0 

12 : else 

13: €i,j t— 1 

14: end if 

15: r; 4- r; — e^j. 

16: Cj 4— Cj — £ i,j ■ 

17: end for 

18: £ m,j 4- Cj mod 2 

19. Cj 4 Cj 6 m ,7 • 

20: c,- 4- Cj /2 

21: Cm 4 r m 6 m j • 

22 : end for 

23: for i = 1, ..., m do 

24: Ei tTl 4— rj mod 2 

25: rj 4- rt — e iin 

26: r; 4- rj/2 

27. Ctj 4 Cfi 6 i j7i 

28: end for 

29: Cn 4— Cnf 2. 

30: e 4— rr^ 1 o e (Apply permutation to rows) 

31: £4— a^ 1 o e (Apply permutation to columns) 

32: t 4- t + 2 6 e. 

33: end for 


To summarize Algorithm 1: sample the least significant bit of each entry in 
the table by its proportion under the conditional distribution, one by one, and 
combine them with the previously sampled bits using probabilistic divide-and- 
conquer (PDC) [1], until all entries of the table have had their least significant bit 
sampled. The next step is the observation that, conditional on having observed 
the least significant bits of the table, the remaining part of each entry is even, 
with even row sums and column sums, and so we can divide each of the entries 
by two, as well as the row sums and column sums, and we have a repeat of the 
same problem with reduced row sums and column sums. The final step at each 
iteration is an application of PDC, which implies that the current iteration can 
be combined with previous iterations as we have done in Line 32. 

Remark 2.1. The cost to evaluate / at each iteration, or more precisely, the 
cost to decide Line 10 , is currently the main cost of the algorithm, which requires 
on average the leading two bits of the evaluation of /, see [40] . Fortunately, we 
do not need to evaluate the quantities exactly, and in fact typically we just need 
a few of the most significant bits. 
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Remark 2.2. Note that the random sampling of the least significant bits of 
the table at each iteration is not equivalent to the random sampling of binary 
contingency tables. The task is considerably easier in general, since we do not 
have to obtain a fixed target at each iteration. 

Theorem 2.3. Algorithm 1 requires an expected 0(mn log (M)) random bits, 
where M is the largest row sum or column sum. 

Remark 2.4. Theorem 2.3 is a statement about the inherent amount of ran¬ 
domness in the sampling algorithm, and not a complete runtime cost of Algo¬ 
rithm 1, which requires the computation of non-random quantities for which 
at present no polynomial-time algorithm is known. We note, however, that it 
is universal with respect to all possible row sums and column sums; in other 
words, if you could count, Algorithm 1 is an explicit, asymptotically efficient 
sampling algorithm. 

Remark 2.5. Assuming Conjecture 2.6 below, Algorithm 1 is a simple, explicit 
polynomial-time algorithm for the uniform generation of to x n (r, c)-contingency 
tables. It samples the least significant bit of each entry of the table one at a 
time, which is where the factor of 0(mn log (M)) is derived. It does not require 
the exact calculation for the number of tables in advance, but rather requires 
that the leading r bits of related tables can be computed exactly in time o( 2 r ) 
times a polynomial in to and n and logarithmically in M, where M is the largest 
row sum or column sum. 

Conjecture 2.6. Let O denote an to x n matrix with entries in {0,1}. For 
any set of row sums and column sums (r, c), let £(r, c, O) denote the number 
of (r, c)- contingency tables with entry {i,j ) forced to be even if the (i,j)th entry 
of O is 1, and no restriction otherwise. Let M denote the largest row sum or 
column sum. Then for any O, the leading r bits o/E(r, c, O) can be computed 
in o(2 r m a n^ log 7 (M)) time, for some absolute constants a,(3,"f > 0. 

Remark 2.7. An asymptotic formula for a large class of (r, c)-contingency 
tables is given in [10] which can be computed in time 0(m 2 n 2 ), but it is unclear 
whether the bounds can be made quantitative, or whether the approach can 
be extended to provide an asymptotic expansion. An asymptotic expansion is 
given in [19, Theorem 5] for binary contingency tables with equal row sums and 
equal column sums which is computable in polynomial time, although because 
it is based on Edgeworth expansions it is not guaranteed to converge as more 
and more terms are summed. 

An equivalent formulation of the rejection function in Algorithm 1 is given by 
a joint distribution of random variables, which we now describe; see also [10]. 
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Geo(g) 
NB(ro, q) 

U 

Bern(p) 

vUsi^Cj) 

q 

Co 


Geometric distribution with probability of success 1 — q, for 0 < 
q < 1, with P (Geo (q) = k) = (1 — q)q k , k = 0,1, 2,_ 

Negative binomial distribution with parameters m and 1 — q, given 
by the sum of m independent Geo (q) random variables, with 



Uniform distribution on [0,1]. We will also use U in the context 
of a random variable; U should be considered independent of all 
other random variables, including other instances of U. 

Bernoulli distribution with probability of success p. Similarly to 
U, we will also use it as a random variable. 

Geo(g) random variables which are independent for distinct pairs 

1 < * < m, 1 < j < n. 

Random variables which have distribution 



and are independent of all other random variables £,i,e(q) for £ ^ j. 
Random variables which have distribution 



and are independent of all other random variables £i,e(q) for £ ^ j. 
Random variables which have distribution 



and are independent of all other random variables £,i/{q) for £ ^ j- 
Random variables which have distribution 



and are independent of all other random variables ^(g) for £ ^ j. 
The vector (gi,..., q n ), where 0 < qt < 1 for all i = 1,..., n. 

(£z,l; 2i ■ ■ • ? £,i,n ) for i = 1; 2, ... , TYl. 

= (£i,U £2 ,j, ■ ■ ■, £,m,j) for j = 1, 2,..., n. 


Geo (g) 
NB(to, q) 

u 

Bern(p) 

Ci,j{Q, c o) 

2 t"j(<l, c o) 

VUsi^Cj) 

2 r l"j, s {<L c o) 

q 

Co 


imsart-generic ver. 2014/10/16 file: ctables.tex date: March 1, 2016 






S. DeSalvo and J. Y. Zhao/Contingency Tables via PDC 


11 


The rejection probability is then proportional to 
P(ei,i =k\E) 


P(ei,i = k) 


= ¥(E\e 1A = k) 

( 2 £i,i(gi) + ElE^dfe) = c i — k,, 2^ 1>1 (gf) + Ej=2 
C 2 = C 2 , R2 = 1~2 


\ C n c n , 

R2 = r 2 


Rm — I’m 


^ 2^1,1,1(91^1) + E"=2^i,j(9i> c i) =n-k 


V 


I^rri — 'Y'm 

( 

m 

2 &,i(9i) +^ 2 ^i,i{Qi) =ci - k 


Ci = Ci \ 

ei,i = ft 
C2 = C2 


Cn — l ' n J 


\ 


i =2 


£1,1 = fc \ 
C 2 = C2 


Cn — Cn / 


^ 2i?p 1 , 1 (9i,c 1 ) + E"=2 ^ij(9i,ci) = n-k 


oc 


#2 = r 2 

Rm — ^ m 

X P ( 26 , 1 ( 9 ?) + X]^.l(9l) = Cl - ft) • 


V 


Ci = Cl \ 
£i,i = k 

C 2 = C2 

Cn — C n J 


The last expression can be substituted for the rejection function / when i = 
j = 1, i.e., we have just shown that 


/(l, 1) k,r, c) oc 


( 771,1,1(91, ci) +£?=2 £>,<*) =n~ k ^ 

^ 2 , 1 , 1 ( 91 , Cl) +E"=2?2,f(9^c^) =r 2 


V »7m, 1 , 1 ( 91 , Cl) + E”=2 CmA^’ =r ” 
x P | 2£j t j(qj) + 'y^ £,jj ( 9j ) = c j ~ k] ■ 


i =2 


(7) 


/ 


Algorithm 1 samples (ei,i|-£7), (e2,i|-B, £1,1), (e3,i\E, ei,i, £2,1), etc., until the en¬ 
tire first column is sampled. Then it starts with the top entry of the second 
column and samples the least significant bits from top to bottom according to 
the conditional distribution. Generalizing equation (7), we have 


ri - k, 
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+ £/=j+l 


= D 


nZi^AQlce) 

+ r ?2,j,ife. c t) 

+ EL, + i 

£2 j(qe,ce) 

= r 2 

c) oc P 



+ E/=j+l 

Ci-i,j{qe,ce) 

= n-i 

nZl^Avlce) 

+Vi,j,i(qj, c j ) 

+ £.%•+! 

€i,j 0?c,q) 

= n - 



Cj) 

+ E”=j+i 


= n+i 


{ Eti2Cafe 2 ^) 

+V'ZnJ,i te’C?) 

+ £/=j+l 


= I’m 





(8) 



xP(^2^(g|) + 

m 

&Aqe) = c i 

- fc V 




t—1 (,=i+1 


Note that X^=i is the sum of i i.i.d. geometric random variables, and is 

hence a negative binomial distribution with parameters i and gj. We thus have 
the sum of two independent negative binomial distributions: 

( i m \ 

5Z = c o~ k \ = P ( 2NB (b Qj) + NB(m - i, qj) = Cj - k ) . 

t—\ f=i+i / 

We note that each random variable in the probability above has an explicitly 
computable and simple probability mass function, which we write below. 

NB(m — 1, q){cj — k} 


' Cj)=k) =¥ (&j(<?) = k) ■ 


(to— l)+(cj— k) — 1 

Co /c 


) 


NB(m, q){cj} 


k = 0,1,..., Cj. 


( 9 ) 


CT 1 ) 

Denote by eij the observed value of the parity bit of £,;,j, i = 1,2,..., to, j = 
1,2,..., n. Let c' = —— ^ =1 C ‘' J . We have 

P(2^(<7,c')=fc) = P(2^-(g 2 ) = fc) 


2 NB(to — l,g 2 ) {c' — |} 


NB(to, q 2 ) {c'} 


( 


(m-l) + (c' -|)-1 


/m+c'-lN 

V C ' ' 

Let c" = Cj — i j = 1,2,..., n. We have 


k = 0,2,4,..., 2c'. 


( 10 ) 


P(2< jiS ( ? ,c") = fc)=P(2^(5 2 ) = fc)) 


P(2 NB(s — 1, q 2 ) + NB(m — s,q) = c" — k) 
P(2 NB(s, q 2 ) + NB(m — s,q) = c") 

( 11 ) 
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for fc = 0, 2,..., 2 \_^-\ . 

c ") =k)= P 

for k = 0,1,..., c". 


k)) 


P(2 NB(s, q 2 ) + NB (m — s — 1, q) = c” — k ) 
P(2 NB(s, q 2 ) + NB(m -s,q) = c") ’ 

( 12 ) 


I!.#. Approximate sampling of integer-valued tables 

If an approximate sampling algorithm for (r, c)-contingency tables is deemed 
acceptable, Lemma 2.8 below demonstrates that the least significant bit of entry 
(i. j) is closely distributed as ^Bern ), with qj = m c f c . , and so one could 

sample the least significant bits of the table one at a time according to this 
unconditional distribution, without applying any other rejections other than 
the parity constraints, and then apply the recursion until all row sums and 
column sums are 0. The following lemma shows that there is a quantitative 
limit to the bias introduced by this and related approaches. 

Lemma 2.8. Any algorithm which uses ^Bern ^ ) J , where qj = m + c , 

as the surrogate distribution for (eij\E) in rejection sampling, assuming each 
outcome in {0,1} has a positive probability of occurring, accepts a bit with an 
incorrect proportion bounded by at most m + 2, where m is the number of rows. 

Proof. We consider the worst case example, which is mild since there are only 
two possible outcomes for each entry in each iteration of the algorithm. 

Since we are normalizing by the max over the two states, at least one of the 
states is accepted with probability 1, and so the total number of rejections is 
bounded from above by the wait time until this state is generated. Since the 
random variable generating the bit is Bernoulli with parameter j-jfp , we have 


P ^Bern (j/— j 



m + c 7 - 1 

- — > —• 

m + 2 Cj 2 


p ( B ™ (if^) 



Cj 1 

- -— > -. 

m + 2 Cj m + 2 


Thus, at worst we accept a bit with proportion m + 2 times more likely than 
the other state. □ 


We suggest a slightly more involved approach, which is to treat the row sum 
conditions as essentially independent, and reject each bit generated as if it was 
independent of all other rows. Algorithm 2 below uses Equation (8) and assumes 
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that the rows are independent, which greatly simplifies the rejection probability 
formula. A rejection function is then given by 


F(i,j,m,n, q, r, c,k,e) 


'j -1 


P F {<$. i c 3-> e ) + + 55 — r * — ^ 


, £=1 


£=j+i 


2 &A<ij) + 55 = c j - k ) > fc e {0, l}. 

Vf=l £=i+l / 


Note that the first term is a probability over a sum of independent random 
variables, and as such can be computed using convolutions in time 0(M 2 ) or 
fast Fourier transforms in time 0(n M log M). Further speedups are possible 
since only the first few bits of the function are needed on average. 


Algorithm 2 Generation of approximately-uniformly random (r, c)-contingency 
table 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 


M <— max (max, r,. max, Cj). 
t <r- m X n table with all 0 entries, 
for b = 0,1,..., [log 2 (M)] do 

Let an denote any permutation such that an o r is in increasing order. 
Let ac denote any permutation such that a@ o c is in increasing order. 
r <- a R or. 
c <— aQ o c. 

for j = 1,..., n — 1 do 
for i = 1,... , m — 1 do 

<h «- Cj/{m + Cj). 

tij Bern(<jj/(1 + qj )). 


if U > 


q ,,Cj ,£i,j ) 

max£ e { 0 ,l} F(i,j,m,n,ci, r i ,Cj ,£) 

goto Line 11. 
end if 


then 


n 


r i - eij. 
Cj * c j — e i,j • 

end for 

e m,j 4— Cj mod 2 
Cj 4 Cj Cm,j • 

C j c j!^ 

I'm 4 — Vm — €m,j • 

end for 

for i — 1 ,,m do 
€i, n 4- ri mod 2 
ri 4 ri € ijn 
ri 4 r»/2 

Cn 4 Cn 
end for 
C-n 4— C n /2. 


e 4— a R * o e (Apply permutation to rows) 
e 4— 0^7 1 ° 6 (Apply permutation to columns) 
t<-t + 2 b e. 

end for 
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Proposition 2.9. Algorithm 2 requires on average 0(m 2 n logM) random bits, 
with 0(mn 2 M log 2 M) arithmetic operations. 

Proof. By Lemma 2.8, each entry is rejected at most an expected 0(m) number 
of times. The rejection function needs to be computed at worst 0{mn log M) 
times, with a cost of 0(nM log M) for performing an n-fold convolution via 
fast Fourier transforms. □ 


2.3. 2 X n tables 


For a more presently practical exact sampling algorithm, we consider the ap¬ 
plication of PDC to the case of 2 x n tables. Like rejection sampling, PDC 
is adaptable, with almost limitless possibilities for its application, and in this 
particular case there is a simple division which yields a simple and practical 
algorithm. 

When there are only two rows, the distribution of Ci ,j given + £2 ,j = Cj is 

uniform on {0,..., Cj }, so we avoid the geometric distribution altogether. This 
yields the following simple algorithm, which does not require the computation 
of any rejection functions nor a lookup table. 


Algorithm 3 generating a uniformly random 2 x n (r, c)-contingency table. 

1: 

for j = 1,..., n — 1 do 


2: 

choose uniformly from {0,.. 

.,Cj} 

3: 

let X 2 j = Cj — x\j 


4: 

end for 


5: 

let xi, n « n — YfjZ i x i j 


6: 

let x 2 ,n =r 2 - YfjZl x 2,j 


7: 

if xi, n < 0 or X 2 ,n < 0 then 


8: 

restart from Line 1 


9: 

end if 


10: 

return x 



To summarize Algorithm 3: sample entries in the top row one at a time, except 
fi.n, uniformly between 0 and the corresponding column sum Cj. The rest of the 
table is then determined by these entries and the prescribed sums; as long as all 
entries produced in this way are non-negative, we accept the result to produce 
a uniformly random (r, c)-contingency table. 

Theorem 2.10. Let U\, U 2 ,..., C7„_ 1 denote independent uniform random vari¬ 
ables, with Uj uniform over the set of integers {0, 1 ,..., cj}, j = 1 ,..., n — 1 , 
and define 

p n := P(Ui + ... + U n - 1 e [ri - c„,ri]). 

Algorithm 3 produces a uniformly random 2 x n {r,c)~contingency table, with 
expected number of rejections 0{\/p n ). 
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Corollary 2.11. When the row sums are equal and the column sums are equal, 
the expected number of rejections before Algorithm 3 terminates is 0(n 1 / 2 ). 

There is a key observation at this point, which is that exponential random 
variables share the same property that given +^ 2 ,j = Cj is uniform over 
the interval [0, Cj]. The algorithm for real-valued 2 x n tables is presented below 
in Algorithm 4, and is essentially the same as Algorithm 3. 


Algorithm 4 generating a uniformly random 2 x n real-valued (r, c)- 
contingency table. 

1 : 

for j = 1,. .., 

n — 1 do 

2: 

choose xij 

uniformly from [0,Cj] 

3: 

let X 2 j = c 

j ~ %l,j 

4: 

end for 


5: 

let x\ : „ = r i - 

-E?=isip 

6: 

let X2,n = T 2 - 


7: 

if x\ n < 0 or 

%2 n < 0 then 

8: 

restart from Line 1 

9: 

end if 


10 : 

return x 



2.4. Approximate sampling of real-valued, tables 


By Lemma 3.5, a uniformly random (r, c)-contingency table with real-valued 
entries has distribution 

(£1 Er,c), 

where £ = (Gj ), 1 < * < m, 1 < j < n, is an m x n matrix of independent expo¬ 
nential random variables with parameter A,;y = — log(l — ptj). An exponential 
random variable £'(A,;j) can be decomposed into a sum 


+ G(pij ), 

where A(A) is a random variable with density 

/( x) = \e~ Xx (l — e -A ) -1 , 0 < x < 1, 

and G{jp) is a geometric random variable with probability of success at each trial 
given by p , independent of A(A). 

One could attempt to adapt the bit-by-bit rejection of Algorithm 1 in the con¬ 
tinuous setting to fractional parts, however, instead of counting the number of 
residual tables, one would have to look at the density of such tables with respect 
to Lebesgue measure in R mxra . Instead, we champion an approximate sampling 
algorithm analogous to Algorithm 2 which circumvents such calculations. 

For each 1 < i < to , 1 < j < n, let Yj j (A,j) denote independent exponential 
random variables with parameter A,; j, and let YG (A ij ) denote a random variable 
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with distribution 


'A,/)) 




El — 


In what follows we will take A ij = — log y J , i.e., A ij does not vary with 
parameter i, and let q-ij = TO ? C . Let [FA , S J denote a random variable with 
distribution 





s 

El £i,j(Qi,j) 
2=1 


E Y i,jAi,j) — C J 

»=s+l 


A suggested rejection function is then 


G{i,j,x,r, c ) 

:=p ^E^(^) + 2 L^a( a m)J 


n 

+ E Y iA^) e 

*=j+l 



X 



El ^ d{cj 

i=i+l 



for a: € [0,1], 


In Algorithm 5 below, we let FractionExp(A) denote the distribution of the 
fractional part of an exponential random variable with parameter A. 
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Algorithm 5 Generation of fractional parts of uniformly random (r, cereal- 
valued table 


1: Let gr denote any permutation such that gr o r is in increasing order. 
2: Let gq denote any permutation such that gq o c is in increasing order. 
3: r i grot. 

4: c 4 gq o c. 

5: for j = 1,..., n — 1 do 
6 : for z — 1,..., m — 1 do 

7: 

8 : ei j <— FractionExp(Aj) 

9: if U > - G(i,j, U j ,r,c) - then 

max a:£[0,l] G (*>3>*> r >c) 

goto Line 8. 
end if 

m-n — £i,j. 

C -j ^ Cn €j 7 ■ 


10 : 

11 : 

12 : 

13: 

14: 

15: 

16: 

17: 

18: 

19: 

20 : 

21 : 

22 : 

23: 

24: 

25: 

26: 


end for 






m,j • 
m,j • 


c j c j ~ 6 
r m 4 r m C- 

end for 

for i = 1 , ..., m do 
£i,» f- (M 

Cn ^ Cn €j,n 

end for 

e <— crjf 1 o e (Apply permutation to rows) 
e <— 1 o e (Apply permutation to columns) 

return e. 


Algorithm 6 Generation of approximately uniform random real-valued (r, c)- 
table_ 

1: r< output of Algorithm 5 with (r, c) input. 

2: r\ <- r< - Z)™=i £i,j, i = 1, ... ,m. 

3: c'. -S- Cj - Eiil e i,j, J = b • • •. n- 
4 : t 4— output of Algorithm 2 with (r', c') input. 

5: Return t + e. 


3. Background 

5 .1. Contingency Tables 

Definition 3.1. Let r = (ri,..., r m ) and c = (ci,... ,c n ) be vectors of non¬ 
negative integers, with r\ + • • • + r m = c± + ■ ■ ■ +c n . An (r,c)~contingency table is 
an to x n matrix £ = (t;ij)i<i< m ,i<j<n with non-negative integer entries, whose 
row sums = y~b ^ and column sums Cj = JA are prescribed by r and 
c. Let A = yb r* = JA c j be the sum of all entries. We denote the set of all 
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(r, c)-contingency tables by the set 



ELi €i,e = r i V 1 < i < m, 
Efei = Cj V 1 < j < n 


The set E is also known as the transportation polytope, see for example [6], and 
the measure of interest is typically the uniform measure over the set of integral 
points. 

We shall also consider real-valued tables with real-valued row sums and column 
sums, in which case the set E is described by 



Since this set has infinite cardinality, we instead consider the density of the set 
with respect to Lebesgue measure on R. mxn . 

3.2. Rejection Sampling 

A random contingency table can be described as the joint distribution of a col¬ 
lection of independent random variables which satisfy a condition. Many com¬ 
binatorial structures follow a similar paradigm, see for example [31, 2] and the 
references therein. Let X = [X denote a collection of indepen¬ 
dent random variables, forming the entries in the table. Given a collection of row 
sums r = (r\,..., r m ) and column sums c = (ci,..., c n ), the random variable 
X ; , with distribution 


(X') X m ,„) | E), 


(13) 


is then representative of some measure over the set E. A general framework for 
the random sampling of joint distributions of this form is contained in [25] . 

When the set E has positive measure, the simplest approach to random sampling 
of points from the distribution (13) is to sample from (X) repeatedly until 
X £ E\ this is a special case of rejection sampling [49], see also [26], which we 
refer to as hard rejection sampling. The number of times we must repeat the 
sampling of (X) is geometrically distributed with expected value P(X £ E)~ x , 
which may be prohibitively large. 

Beyond hard rejection sampling, one must typically exploit some special struc¬ 
ture in (X) or E in order to improve upon the number of repetitions. Indeed, for 
contingency tables, we can easily improve upon the naive hard rejection sam¬ 
pling of the entire table by applying hard rejection sampling to each column inde¬ 
pendently, with total expected number of repetitions Ej(^(Et=i = c j)) -1 ; 
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then, after each column has been accepted, we accept the entire set of columns 
if every row condition is satisfied. This is the approach championed in Section 6 
for more general distributions on the entries. 

Finally, we note that hard rejection sampling fails when P(X £ E) = 0; in this 
case, the target set of interest typically lies on a lower-dimensional subspace of 
the sample space, and it is not apparent how one could generally adapt hard re¬ 
jection sampling. In terms of contingency tables, we may wish to sample random 
real-valued points which satisfy the conditions; if the sums of random variables 
have densities, the conditioning is well-defined, even though the probability of 
generating a point inside the table is 0; see [25]. 


3.3. Uniform Sampling 

We now summarize some properties that we utilize to prove our main results. 

Lemma 3.1. Suppose X = (X,j)i<j< mi i<j< n are independent geometric ran¬ 
dom variables with parameters pij. If Pij has the form pij = 1 — oti/3j, then X 
is uniform restricted to (r,c)~ contingency tables. 

Proof. For any (r, c)-contingency table £, 

P(X = £) = l[r(x ZJ = &) = l{(n,.-g-d n,.-y - IK II s; n (!-«*& 

i,j '•./ i J ij 

Since this probability does not depend on £, it follows that the restriction of X 
to (r, c)-contingency tables is uniform. □ 

For j = 1,..., n, let Cj = (Cjy,..., C m j ) be independent random vectors with 
distribution given by (Xij ,..., X m j ) conditional on X/j = cj\ that is, 

mtr< _ (<- e W _ = • ' • i Xmj = imj) 

- Ky ,.. •, M) - P (E i x ij = c j ) 

for all non-negative integer vectors with = Cj, and 0 otherwise. 

Lemma 3.2. The conditional distribution of C = (Ci,..., C n ) given y] . Cij = 
ri for all i is that of a uniformly random (r, c)-contingency table. 

Proof. For any (r, c)-contingency table £, P (C = f) is a constant multiple of 

P(x = C)- □ 

Our next lemma shows how to select the parameters p^ in order to optimize 
the probability of generating a point inside of E. It also demonstrates a key 
difficulty in the sampling of points inside E, which is that we can only tune the 
probabilities pij to optimally target rows sums or column sums, but not both 
simultaneously. 
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Lemma 3.3. Suppose X is a table of independent geometric random variables, 
where Xij has parameter ptj = m/(m + Cj), 1 < i < m, 1 < j < n. Then the 
expected columns sums of X are c, and the expected row sums of X are N/m. 


Proof. For any j = 1,..., n, 

m 

i=1 

Similarly, for any i = 1,..., to, 

n 

y ® [ x ij\ 

i=i 

Remark 3.4. Note that entries in different rows (columns, resp.) are condi¬ 
tionally independent. This means that we can separately sample all of the rows 
(columns, resp.) independently until all of the row (column, resp.) conditions are 
satisfied. It also means that once all but one column (row, resp.) are sampled ac¬ 
cording to the appropriate conditional distribution, the remaining column (row, 
resp.) is uniquely determined by the constraints. 

For real-valued tables, the calculations are analogous and straightforward. 

Lemma 3.5. Suppose X = (X i j)i<j< m l <j< 7l are independent exponential ran¬ 
dom variables with parameters Ajj := — log(l — Pij ). If Pij has the form pij = 
1 — otiffj, then X is uniform restricted to real-valued (r, c)-contingency tables. 


E 


m + Cj 


m 


1 ) = Cj. 


E 

3 =1 


m + c, 


-1 = 


N 

m 


□ 


Proof. For any real-valued (r, c)-contingency table £, 

P(X e df) = n p ( x ^ e d ^) = II A M e~ Xi ^ 
i,j i,j 

- II(Ai,i)II ('cSjr - II v; \\^ n(\i)• 

i,j i,j i .1 '-j 

Since this probability does not depend on £, it follows that the restriction of X 
to real-valued (r, c)-contingency tables is uniform. □ 

Lemma 3.6. Suppose X is a table of independent exponential random variables, 
where X, j has parameter = — log (1 — Pij), 1 < * < m, 1 < j < n. Then the 
expected columns sums of X are c, and the expected row sums of X are N/m. 

Remark 3.7. Remark 3.4 applies when the Xij’s are independent and have any 
discrete distribution, not just geometric. If the Xij’s are independent continu¬ 
ous random variables such that each of the sums ftiLi Xi,j for j = 1,2,... ,n 
and Yft, Xij for i = 1,2,... ,m has a density, then the same conditional inde¬ 
pendence of entries in different rows are conditionally independent, and we can 
separately sample all of the rows independently until all of the row conditions 
are satisfied. 
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3 . 4 . Probabilistic Divide-and-Conquer 

The main utility of PDC is that when the conditioning event has positive prob¬ 
ability, it can be applied to any collection of random variables. When the con¬ 
ditioning event has probability 0, some care must be taken, although in our 
current setting we can apply [25, Lemma 2.1, Lemma 2.2] directly. 

We now recall the main PDC lemma, which is simplest to state in terms of 
a target event E of positive probability. Suppose that A and B are sets, and 
let E be a subset of A x B. Let A and B be probability measures on A and 
B, respectively. The goal of PDC is to provide a technique to sample from the 
distribution of the cartesian product of A and B restricted to E. 


Theorem 3.8 (Probabilistic Divide-and-Conquer [1]). For each a € A, let 
B a = {b £ B : (a, b ) £ E}, where P((A, B) £ E) > 0. Let 

(A f ) := ( A\(A,B)£E ), 

(B' a ) := (B\B a ). 

Then, (A',B' a ,) = (( A, B) \ E). 


A similar theorem holds when P((A, B) £ E) = 0, under some simple conditions. 


Theorem 3.9 (Probabilistic Divide-and-Conquer for certain events of proba¬ 
bility 0 [25]). For each a £ A, let B a = {b £ B : (a, b) £ E}, where P((A, B) £ 
E) = 0. Suppose there is a random variable T with a density, and k £ range(T) 
such that P((A, B) £ E) = P(T = k). Suppose further that for each a £ A, there 
is a random variable T a , either discrete or with a density, and k £ range(T a ), 
such that P (B a = b) = P(T a = k). Let 


(A') := (A | ( A,B)£E ), 
{B' a ) := (B j B a )- 


Then, (A',B' a ,) = (( A,B) \ E). 


Our recommended approach to sample from ( A') is rejection sampling. For any 
measurable function p : S —> [0,1], let (Z | U < p{Z)) denote the first coordinate 
of (( Z,U) | U < p(Z)), where U is a uniform random variable on [0,1] inde¬ 
pendent of Z. This allows concise descriptions of distributions resulting from 
rejection sampling. 

Lemma 3.10 ([25]). Under the assumptions of Theorem 3.8 or Theorem 3.9, 
pick any finite constant C > sup^. B(B X ). We have 



That is, to sample from ( A'), we first sample from (A), and then reject with 
probability 1 — B(B a )/C. 
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In our applications of PDC that follow, we indicate the use of PDC by specifying 
distributions (A) and ( B) and an event E such that the measure of interest is 
((A,B)\(A,B)gE). 


4. Proof of uniformity 


Lemma 4.1. Algorithm 1 produces a uniformly random (r, c)-contingency table. 


Proof. We need to show that the function / in Section 2.1 samples €ij in its 
right proportion, namely, 

\E , (e£, 1)^=1,.. ,,m) ■ ■ • 1 — l )b=l(/£,j )^=l,...,j — l) 5 

i.e., that we are getting the correct distribution by filling in the bits one at a 
time. 


Let us start by considering the random sampling of the least significant bit, say 
€ 1 , 1 , of the top left entry of the table. Suppose we sample from 


(eu) = ( Bern (i+il))' 

and then reject according to the correct proportion in the conditional distribu¬ 
tion. We accept this sample in proportion to P(_E|ei,i), which is given by 

/ (n -k ,...), \ 

F(E\e hl = k) = E ( Cl - fc,...), \-q^- k (l-q 2 1 )(l-q 1 ) m - 1 l[q c /(l-q J ) m . 

V 01,1 J j =2 

Normalizing by all terms which do not depend on k gives 

( (n —k,...), \ 

(ci - k ,...), ] • qf k , 

Si.1 / 


and since P(ei.i = k) oc Equation (3) follows for the case i = j = 1. The 
same reasoning applies for 1 < i < m — 2 and 1 < j < n — 2. 

Next, consider the case when we sample from (e m _i j) = ^Bern ^ ^ ^ ; j = 

1,2,..., n — 1. We accept this sample in proportion to P(£jei,i,..., 

Note, however, that given the least significant bits of the first m — 1 entires of 
any given column, the bit e m j is determined by the column sum. Let k := e m -ij 
and k' := e m j be such that Cj — k — k’ is even. Applying PDC deterministic 
second half, then implies the rejection function is proportional to 

( (•••j fm—i r m k), \ t 

(..., Cj -k- k 1 , ...), • q C /- k ~ k (1 - 9 |) m - 2 (1- 

Omj J 
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Since P(e OT _ij = k) oc qj, Equation (5) follows. 

The other cases are similar, and simply require keeping track of which other bits 
are determined under the conditioning. We have thus demonstrated the ability 
to exactly sample from the least significant bit of each entry of the table. 

We now proceed by induction. Let eo denote the least significant bits of all en¬ 
tries in the table, with e,, i = 1,2,... denoting the (i+ l)th least significant bits 
of the table. The first iteration of the algorithm generates an element from the 
distribution (eo | E rtC ), and updates the values of r and c to r' and c', respec¬ 
tively, using the values in eo- The second iteration of the algorithm generates an 
element from the distribution (eg | E r i tC i), which is distributed as (ei | E rc , eo). 
By Theorem 3.8, the expression for t in Line 32 is distributed as (eo + 2ei | E r , c ). 

Assume that after the first b— 1 iterations of the algorithm, the expression for t in 
Line 32 is distributed as (eo + 2ei + ... 2 h_1 eb-i \ E riC ), and that r' is the vector of 
remaining row sums, and c' is the vector of remaining column sums given the first 
b— 1 bits of the entries of the table. Then, the &-th iteration of the algorithm gen¬ 
erates (eg | E r ' jC i) which is distributed as (eb | E r ^ c . eo,..., eb_i). By Theorem 3.8, 
the expression for t in Line 32 is thus distributed as (eo + 2ei + ... 2 b eb | E r , c ). 

After at most [log 2 (M)J + 1 iterations, where M is the largest row sum or 
column sum, all row sums and column sums will be zero, at which point the 
algorithm terminates and returns the current table t. □ 


5. 2 by n tables 


In this section, N denotes the sum of all entries in the table. 

Dyer and Greenhill [32] described a 0(n 2 log N) asymptotically uniform MCMC 
algorithm based on updating a 2 x 2 submatrix at each step. Kijima and Mat- 
sui [38] adapted the same chain using coupling from the past to obtain an exactly 
uniform sampling algorithm at the cost of an increased run time of 0(n 3 log N). 
In this section, we will show that Algorithm 3, which is also exactly uniform 
sampling, runs with an expected 0(1/p n ) number of rejections, where p n is a 
certain probability that a sum of independent uniform random variables lie in 
a certain region. 


Proof of Theorem 2.10. Let £ be a 2 x n (r, c)-contingency table, and let £' = 

(£iij£i2 ,..., Ci,n_i); that is, the top row without the last entry. Since r and c 

are fixed, there is a bijective relationship between £ and £'; each determines the 
other. Then, 


P[a; = £] = P[arn = £n, *12 = £ 12 , • • •, aq.n-i = £i,n-i] 


1 

(ci + l)(c2 +!)••• (c„_i + 1) 


This does not depend on so x is uniform restricted to (r, c)-contingency tabes. 
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The first row of entries x±i, ..., Xi jn _i is accepted if and only if X\ n = rq — 
Xll — • • • — xi n —i lies between 0 and c n , which occurs with probability p n = 
+ ■ • ■ + U n -1 £ [rq — c n ,7q]). □ 

Theorem 5.1. Let U\, U 2 , ■ • ■, U n -1 denote independent uniform random vari¬ 
ables, with Uj uniform over the continuous interval [0, cf\, j = 1,..., n — 1, and 
define 

p n := P(f7i + ... + U n -i £ [rq - Cn, ri]). 

Algorithm j produces a uniformly random 2 x n real-valued (r,c)~ contingency 
table, with expected number of rejections 0{l/p n ). 

Corollary 5.2. When the row sums are equal and the column sums are equal, 
the expected number of rejections before Algorithm j terminates is Ofn 1 / 2 ). 

We observe that the algorithm runs quickly when rq ~ E[C/i + • —b U n ] = N/2, 
i.e., the two row sums are similar in size, and also when c n is large. Since we 
can arbitrarily reorder the columns and choose c n to be the largest column 
sum, it follows that having a skewed distribution of column sums and an even 
distribution of row sums is advantageous to runtime. 

Denote by <£>(•) the cumulative distribution function of the standard normal 
distribution. 

Corollary 5.3. Suppose U\ + ... + U n -\ satisfies the central limit theorem. 
Assume there exists some t £ R such that, as n —» 00 , we have 


r l-Cn- 


ci + ...+e„i 


=1+- 


/ c 2 +...c 2 

Let c' n = c n /y —— 12 " -1 . Then, asymptotically as n —> 00 , the expected number 
of rejections before Algorithm 3 terminates is 0($(t + c' n ) — $(£))■ 


Proof. Letting Z denote a standard normal random variable, we have 

P([/i + 1 £ [rq — c„, rq]) ~ P(Z £ [t,t + c' n }). □ 

Corollary 5.4. Suppose U\ + .. . + U n —i satisfies the central limit theorem. Sup¬ 
pose c' n —y A £ (0, 00 ]. Then the expected number of rejections before Algorithm 3 
terminates is 0(1). 

Corollary 5.4 says that when the square of the largest column sum dominates 
the sum of squares of the remaining column sums, then the majority of the 
uncertainty is in column 1, which is handled optimally by PDC. 

Proof of Corollary 5.2. In this case, we have c' x = c/y/c 2 (n — 1)/12 = 0(l/y / ri), 
hence we have the acceptance probability p n = 0(1/ y/n). □ 
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The following table demonstrates that under the conditions of Corollary 5.2, i.e., 
equal row sums and columns sums, the expected number of rejections grows like 
0(i/n), and the expected runtime grows like 0(n 3 / 2 log N). 


Rows 

Columns 

Density 

Rejections 

Runtime 

2 

2 

5 

0* 

269ns* 

2 

10 

5 

1.32* 

1.13ns* 

2 

100 

5 

6.22* 

23.0|rs* 

2 

1000 

5 

21.6+ 

665|ls+ 

2 

10 4 

5 

69.3+ 

19.7ms+ 

2 

10 5 

5 

199+ 

580ms+ 

2 

10® 

5 

755+ 

23.7s+ 


Table 1 


Simluated runtime under Algorithm j for sampling contingency tables with homogeneous 
row and column sums, compared to optimised rejection sampling where columns are picked 
using a discrete uniform random variable. The symbols *, f and | denote averages over a 
sample of size 10®, 1000 and 1 respectively. As predicted analytically, the number of 
rejections grows as 0(-/n) while the runtime grows as 0(ra 3 / 2 log N). 


6. Other Tables 


One can more generally sample from a table having independent entries with 
marginal distributions (Xyi), ( X\ (X mj „), i.e., 

(X) = ((X i , j )i< i < m ,i< J -< Tl | E). (14) 

If the rejection probabilities can be computed, then we can apply a variation of 
Algorithm 1, and possibly also a variation of Algorithm 6. 

It is sometimes possible to circumvent the column-rejection probabilities. We 
now state Algorithm 7, which is a general procedure of independent interest 
that samples from a conditional distribution of the form 

n 

E x ' = fc 

i =1 

see Lemma 6.1 for the explicit form of the rejection probability t(a). 
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Algorithm 7 Random generation from ((Xi,X 2 , ■ ■ ■, X n )| E” = i Xi = k) 

1: Assume: (Ai), (A 2 ),..., (An) are independent and either all discrete with P(X]ILi A* = 
k) > 0, or ( 527=1 A») has a bounded density with /^n_ Xi{k) > 0. 

2: if n = 1 then 
3: return /c 

4: end if 

5: let r be any value in {1,..., n}. 

6: for i — 1,..., r do 
7: generate A i from (A*). 

8: end for 

9: let a = (Ai,..., A r ) 

10: let s = 52i= 1 Aj. 

11: if U > t(a) (See Lemma 6.1) then 
12: restart 

13: else 

14: Recursively call Algorithm 7 on /^(Ar+i),... , £(A n ), with target sum k — s, and set 

(A r +i,..., A n ) equal to the return value. 

15: return (Ai,...,A n ). 

16: end if 


Lemma 6.1. Suppose for each a 1 b , £ {1,..., n}, a <b, either 

1. (y^_ n Xj^J is discrete and 

, P (E?=r+1 * = * ~ ELl *1*1, • • ■ , *) 
W max I) P(E” =r+1 ^=^) 

or 

2. (E- = a *) ^ as a bounded density, denoted by f ay b, and 

i(a) = /r+l,n(fc-EL^I^lv.^) 
max,, f r+ i tTl {rf) 


(15) 


(16) 


TTien Algorithm 7 samples from ((Xi, X 2 ,..., X n )| E"=i * = fc). 

Proof. The rejection probability t(a) is defined, depending on the setting, by 
Equation (15) or Equation (16), so that once the algorithm passes the rejection 
step in Line 11, then for any 1 < b < n, the vector (Xi,..., X&) has distribution 
(A\h(A, B) = 1), where A = (Xi,..., X&) and h(A, B) = 1(EILi * = &)• Let 
a denote the observed value in this stage. 

We now use induction on n. When n = 1, we take A = (Xi) and B = 0, then 
Algorithm 7 with input (£(Xi),fc) returns the value of the input target sum k 
for any k £ range(Xi), which has distribution £(Xi|Xi = k). 

Assume, for all 1 < b < n, Algorithm 7 with input (£(Xj,+i),..., £(X n ), l) 
returns a sample from £(Xb+i,... X n \h(a, B) = £), for any £ £ range(E" = i Xf ); 
i.e., it returns a sample from {B\h{a, B) = 1), say b , where B = (Xb+i,..., X n ). 
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Hence, each time Algorithm 7 is called, it first generates a sample from distribu¬ 
tion (A\h(A, B) = 1), and then the return value of the recursive call in Line 15 
returns a sample from ( B\h{a , B) = 1). By Lemma 3.10, (a, b ) is a sample from 
C((A,B)\h(A,B) = l). □ 

In the case where computing the row rejection probabilities after the generation 
of each column is not practical, we recommend independent sampling of columns 
1,..., n— 1 all at once, with a single application of PDC deterministic second half 
for the generation of the final column. This approach is more widely applicable, 
as it requires very little information about the marginal distribution of each 
entry. 

Let X' in denote the random variable with distribution Xi tU ). In the 

following algorithm, the function used for the rejection probability, when X- n 
is discrete, is given by 


Mb {X' i n ), k) = P( X' i n = r* - k), 

and when X[ n is continuous with bounded density fx>. , is given by 
Mb {X' i>n ),k) = fx' in [ r i - k). 

Thus, for Algorithm 8 below, we simply need to be able to compute the distri¬ 
bution ( X[ n ) and find its mode, for i = 1, 2 ,.. ., m. 


Algorithm 8 Generating a random variate from (A) specified in Equation (14). 
1: for j = 1,..., n — 1 do 

2: let Cj denote the return value of Algorithm 7 using input ((Xij), ..., (Xmj), Cj). 

3: end for 

4: let x it „ =ri- Xij for i = 1,..., m. 

5: if U > UT-i — ~ hArv' Tn restart from Line 1 

6: return x 


Proposition 6.2. Algorithm 8 samples points according to the distribution 
in (14). 

The proof follows analogously to Lemma 4.1, and uses the conditional indepen¬ 
dence of the rows given the column sums are satisfied. 

In the most general case when even the columns cannot be simulated using 
Algorithm 7 or another variant, we apply PDC deterministic second half to 
both the columns and the rows, which simply demands in the continuous case 
that there is at least one random variable with a bounded density per column 
(resp., row). In Algorithm 9 below, each column has a rejection function tj, 
which is either the normalization of the probability mass function 

^ ~ a ) 

3 max^P (Xijj=£) 
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or the normalization of the probability density function 


fXijJ ( Q ) 

SUP {t)‘ 


There is also a row rejection function Sj. Let XX denote the random variable 
with distribution When {X[ ■) is discrete, we have 


Si (a) 


nx'ij = q ) 

maxf P(Xl )i = £) 


and when {XX) is continuous, we have 


Si (a) 


sup^ fx [_. (0 ‘ 


Algorithm 9 Generating ((X^i, Xi ^,..., X min ) | E) 

1: let / ( {1..... n } denote a column. 

2: for j e {1, 2,..., n} \ {j'} do 

3: let ij E {1,..., m} denote a row in the j-th column 

4: for i = {1, 2,..., m} \ {ij} do 

5: generate x%j from ( Xij ) 

6: end for 

T: let — Cj ^ij 

8: with probability 1 — tj(xi^j) restart from Line 2 

9: end for 

10: for i = 1,..., m do 

11: let Xij! mn- Xij 

12: with probability 1 — Si(xj i j/) restart from Line 2 

13: if x^/ < 0 restart from Line 1 

14: end for 
15: return x 


Proposition 6.3. Algorithm 9 samples points according to the distribution 
in (14). 
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